The dataset includes:

  • Year: 1995-2024
  • Sports_Facilities: Total number of sports facilities
  • People_Doing_Sports_K: Number of people doing sports (in thousands)
In [17]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns



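# Load the raw data.
df = pd.read_csv("idman_dataset.csv")

# Rename the columns and drop the extra second column (see the note below).
df.columns = ['Year', 'Unnamed', 'Sports_Facilities', 'People_Doing_Sports_K']
df = df.drop(columns='Unnamed')

# Confirm that no missing values remain.
print(df.isnull().sum())

# Sort chronologically (the raw file is ordered newest first).
df = df.sort_values('Year')

df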
Year                     0
Sports_Facilities        0
People_Doing_Sports_K    0
dtype: int64
Out[17]:
    Year  Sports_Facilities  People_Doing_Sports_K
29  1995             6670.0                  394.1
28  1996             6937.0                  298.7
27  1997             6885.0                  353.4
26  1998             7238.0                  329.6
25  1999             7864.0                  352.6
24  2000             7908.0                  355.2
23  2001             8129.0                  400.5
22  2002             8214.0                  410.3
21  2003             8940.0                  539.6
20  2004             8948.0                  546.5
19  2005             8732.0                  529.8
18  2006             9000.0                  538.4
17  2007             9323.0                  539.3
16  2008             9604.0                  542.6
15  2009             9623.0                 1617.4
14  2010             9491.0                 1649.8
13  2011             9954.0                 1660.4
12  2012            10259.0                 1678.4
11  2013            10574.0                 1685.1
10  2014            10798.0                 1723.8
9   2015            11027.0                 1724.7
8   2016            11215.0                 1755.4
7   2017            11412.0                 1785.9
6   2018            11545.0                 1827.9
5   2019            11674.0                 1864.8
4   2020            11770.0                 1861.6
3   2021            11915.0                 1897.6
2   2022            12156.0                 1918.8
1   2023            12270.0                 1921.6
0   2024            12290.0                 1898.8

I dropped the unnamed column because it contains NaN values and is not needed for the analysis.
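Before dropping a column, it can help to confirm that it really is empty. The sketch below is illustrative only: it uses a small made-up frame rather than the real CSV, the column name `Unnamed: 1` is hypothetical, and it assumes the extra column is entirely NaN. It drops only columns in which every value is missing.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the raw CSV; the column name 'Unnamed: 1' is hypothetical.
raw = pd.DataFrame({
    "Year": [2024, 2023, 2022],
    "Unnamed: 1": [np.nan, np.nan, np.nan],
    "Sports_Facilities": [12290.0, 12270.0, 12156.0],
    "People_Doing_Sports_K": [1898.8, 1921.6, 1918.8],
})

# Columns in which every value is missing.
empty_cols = [col for col in raw.columns if raw[col].isna().all()]
print("All-NaN columns:", empty_cols)

# Drop only fully empty columns, so a partially filled column is never lost by accident.
clean = raw.dropna(axis=1, how="all")
print(clean.columns.tolist())
```

If the real column turned out to be only partly empty, `how="all"` would leave it in place, which is a prompt to inspect it before deciding.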

In [21]:
plt.figure(figsize=(10, 6))
plt.plot(df['Year'], df['Sports_Facilities'])
plt.title('Number of Sports Facilities Over Time')
plt.xlabel('Year')
plt.ylabel('Number of Facilities')
plt.grid(True)
plt.show()

The number of sports facilities grew from 6,670 in 1995 to 12,290 in 2024, an 84% increase. Growth was steady, with only three small year-over-year dips (1997, 2005 and 2010), none larger than 2.5%.
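The growth figure and the size of any dips can be checked directly from the table. The following is a minimal sketch that re-enters the facility counts from the output above as a standalone Series (the variable names are illustrative, not part of the notebook).

```python
import pandas as pd

# Facility counts copied from the table above (1995-2024).
facilities = pd.Series(
    [6670, 6937, 6885, 7238, 7864, 7908, 8129, 8214, 8940, 8948,
     8732, 9000, 9323, 9604, 9623, 9491, 9954, 10259, 10574, 10798,
     11027, 11215, 11412, 11545, 11674, 11770, 11915, 12156, 12270, 12290],
    index=range(1995, 2025),
    name="Sports_Facilities",
)

# Total percentage change from the first year to the last.
total_growth_pct = (facilities.iloc[-1] / facilities.iloc[0] - 1) * 100
print(f"Total growth: {total_growth_pct:.1f}%")

# Years in which the count fell compared with the previous year.
yoy_change = facilities.diff()
declines = yoy_change[yoy_change < 0]
print(declines)
```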

In [22]:
fig, ax1 = plt.subplots(figsize=(12, 6))


ax1.plot(df['Year'], df['Sports_Facilities'], 'b-o')
ax1.set_xlabel('Year')
ax1.set_ylabel('Sports Facilities', color='blue')


ax2 = ax1.twinx()
ax2.plot(df['Year'], df['People_Doing_Sports_K'], 'r-s')
ax2.set_ylabel('People (thousands)', color='red')

plt.title('Sports Facilities vs Participation Over Time')
plt.show()
In [19]:
# Apply seaborn's whitegrid style to all subsequent matplotlib charts
sns.set_style("whitegrid")
In [20]:
import plotly.graph_objects as go
import plotly.express as px

# Create interactive dual-axis chart
fig = go.Figure()

fig.add_trace(go.Scatter(
    x=df['Year'], 
    y=df['Sports_Facilities'],
    name='Sports Facilities',
    mode='lines+markers'
))

fig.add_trace(go.Scatter(
    x=df['Year'], 
    y=df['People_Doing_Sports_K'],
    name='People Doing Sports (K)',
    mode='lines+markers',
    yaxis='y2'
))

fig.update_layout(
    title='Interactive: Sports Facilities vs Participation',
    yaxis=dict(title='Sports Facilities'),
    yaxis2=dict(title='People (thousands)', overlaying='y', side='right')
)

fig.show()

There is an anomaly between 2008 and 2009: the number of people doing sports roughly triples in a single year, from 542.6K to 1,617.4K.
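One way to locate such a jump without reading it off the chart is to look at year-over-year percentage changes. The sketch below re-enters the participation values from the table above. The 50% threshold is an arbitrary choice for illustration, not something the dataset prescribes.

```python
import pandas as pd

# Participation (thousands) copied from the table above (1995-2024).
participation = pd.Series(
    [394.1, 298.7, 353.4, 329.6, 352.6, 355.2, 400.5, 410.3, 539.6, 546.5,
     529.8, 538.4, 539.3, 542.6, 1617.4, 1649.8, 1660.4, 1678.4, 1685.1, 1723.8,
     1724.7, 1755.4, 1785.9, 1827.9, 1864.8, 1861.6, 1897.6, 1918.8, 1921.6, 1898.8],
    index=range(1995, 2025),
    name="People_Doing_Sports_K",
)

# Percentage change relative to the previous year (the first year is NaN).
yoy_pct = participation.pct_change() * 100

# Year with the largest increase; idxmax ignores the leading NaN.
jump_year = yoy_pct.idxmax()
print(f"Largest jump: {yoy_pct.loc[jump_year]:.1f}% in {jump_year}")

# Flag every year whose absolute change exceeds the (arbitrary) 50% threshold.
flagged = yoy_pct[yoy_pct.abs() > 50]
print(flagged.round(1))
```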

In [24]:
plt.figure(figsize=(8, 6))
sns.scatterplot(x='Sports_Facilities', y='People_Doing_Sports_K', data=df, s=100)
plt.title('Correlation: Facilities vs Participation')
correlation = df['Sports_Facilities'].corr(df['People_Doing_Sports_K'])
plt.text(0.05, 0.95, f'Correlation: {correlation:.3f}', 
         transform=plt.gca().transAxes, fontsize=12)
plt.show()

Correlation Scatter Plot

In [25]:
plt.figure(figsize=(6, 4))
sns.heatmap(df[['Sports_Facilities', 'People_Doing_Sports_K']].corr(), 
            annot=True, cmap='coolwarm', center=0)
plt.title('Correlation Heatmap')
plt.show()
In [26]:
df['Facilities_Growth'] = df['Sports_Facilities'].pct_change() * 100
df['Participation_Growth'] = df['People_Doing_Sports_K'].pct_change() * 100

plt.figure(figsize=(12, 5))
plt.plot(df['Year'], df['Facilities_Growth'], label='Facilities Growth %', marker='o')
plt.plot(df['Year'], df['Participation_Growth'], label='Participation Growth %', marker='s')
plt.title('Year-over-Year Growth Rates')
plt.xlabel('Year')
plt.ylabel('Growth Rate (%)')
plt.legend()
plt.axhline(y=0, color='black', linestyle='--', alpha=0.3)
plt.grid(True)
plt.show()
In [27]:
df['Facilities_per_1K'] = df['Sports_Facilities'] / df['People_Doing_Sports_K']

fig = px.line(df, x='Year', y='Facilities_per_1K', 
              title='Sports Facilities per 1000 Participants',
              markers=True)
fig.update_layout(yaxis_title='Facilities per 1000 Participants')
fig.show()

1. Steady Infrastructure Growth

Azerbaijan showed consistent commitment, with sports facilities growing 84% from 6,670 (1995) to 12,290 (2024), a compound annual growth rate of roughly 2.1%.
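An average annual growth rate over a multi-year span is commonly expressed as a compound annual growth rate (CAGR). The sketch below shows that calculation for the two endpoint values quoted above. Treating 1995 to 2024 as 29 annual intervals is the assumption made here.

```python
# Endpoint values quoted in the text above.
start_year, end_year = 1995, 2024
start_value, end_value = 6670, 12290

# 30 yearly observations span 29 year-over-year intervals.
periods = end_year - start_year

# CAGR: the constant yearly rate that turns start_value into end_value.
cagr = (end_value / start_value) ** (1 / periods) - 1
print(f"CAGR over {periods} intervals: {cagr * 100:.2f}%")
```

Dividing the total 84% by the number of years would overstate the yearly rate, because it ignores compounding.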

2. The 2008-2009 Data Anomaly

Participation jumped 198% between 2008 and 2009 (543K → 1,617K), while facilities grew by only 0.2% (9,604 → 9,623). A jump of this size without a matching change in infrastructure most likely reflects a change in how participants were counted rather than a real surge, so participation figures before and after 2009 are not directly comparable.
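If the jump does reflect a change in counting method (an assumption, since the dataset itself does not say so), one cautious approach is to analyse the two periods separately instead of pooling them. The sketch below rebuilds the table as a standalone frame and reports, for each period, the facilities-participation correlation and the change in participation. The break year of 2009 is read off the data, not documented anywhere.

```python
import pandas as pd

# Values copied from the table above (1995-2024).
df_check = pd.DataFrame({
    "Year": list(range(1995, 2025)),
    "Sports_Facilities": [
        6670, 6937, 6885, 7238, 7864, 7908, 8129, 8214, 8940, 8948,
        8732, 9000, 9323, 9604, 9623, 9491, 9954, 10259, 10574, 10798,
        11027, 11215, 11412, 11545, 11674, 11770, 11915, 12156, 12270, 12290],
    "People_Doing_Sports_K": [
        394.1, 298.7, 353.4, 329.6, 352.6, 355.2, 400.5, 410.3, 539.6, 546.5,
        529.8, 538.4, 539.3, 542.6, 1617.4, 1649.8, 1660.4, 1678.4, 1685.1, 1723.8,
        1724.7, 1755.4, 1785.9, 1827.9, 1864.8, 1861.6, 1897.6, 1918.8, 1921.6, 1898.8],
})

# Assumed break point: first year of the higher participation level.
BREAK_YEAR = 2009
pre = df_check[df_check["Year"] < BREAK_YEAR]
post = df_check[df_check["Year"] >= BREAK_YEAR]

for label, segment in [("1995-2008", pre), ("2009-2024", post)]:
    people = segment["People_Doing_Sports_K"]
    # Pearson correlation within the segment only.
    r = segment["Sports_Facilities"].corr(people)
    # Participation change from the segment's first year to its last.
    change_pct = (people.iloc[-1] / people.iloc[0] - 1) * 100
    print(f"{label}: n={len(segment)}, r={r:.2f}, participation change={change_pct:.1f}%")
```

Comparing the per-period correlations with the pooled value shows how much of the overall relationship comes from the level shift rather than from year-to-year co-movement.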

Conclusion

Key Findings:

  • ✅ 30 years of consistent infrastructure investment
  • ⚠️ Likely methodology change in 2008-2009 complicates trend analysis
  • 📈 Strong correlation between facilities and participation